Causal non-locality can arise from constrained replication

The fundamental theories of physics are local theories, depending on local interactions of local
variables. It is not clear if and how strictly local theories can produce non-local variables that
have causal effectiveness. Yet, non-local effectiveness appears to exist, such as in the form of
memory (non-locality through time) and causally effective spatial structures (non-locality through
space). Here it is shown, by construction, how such non-locality can be produced from elementary
components: non-isolated systems, multiplicative noise, self-replication, and elimination. A theory
is derived that explains how causal non-locality can arise from strictly local interactions.

I. INTRODUCTION

The theories that form the foundation of physics, quantum field theory and general relativity,
are local theories [1]. They describe the evolution of local field variables in terms of local
interactions in space-time. Such locality is consistent with the empirical facts that physical
systems flow contiguously through time and that causal influences cannot travel faster than the
speed of light. Nevertheless, local theories are often formulated as non-local ones with non-local
variables, if that is convenient for understanding and calculation. For example, finding the
dynamics of a system from the principle of least action requires non-local trajectories. Similarly,
Maxwell's equations in local, differential form, e.g. $\nabla \cdot \mathbf{E} = \rho/\epsilon_0$,
can be formulated in non-local, integral form, e.g.
$\oint_S \mathbf{E} \cdot d\mathbf{a} = \int_V \rho\, dV/\epsilon_0$. Whereas the first form is
purely defined locally, the second form equates non-local quantities obtained by integrating over a
non-local surface and a non-local volume.

Although non-local formulations are fully equivalent, mathematically, to the corresponding local
ones, they are different in the way they map formalism to physical reality. Physical reality is
taken to arise from local interactions. Therefore, only local variables are causally effective in
the sense that they refer to quantities directly involved in interactions that produce change. In
contrast, quantities denoted by non-local variables do not directly interact. They are not directly
causally effective themselves. Non-local theories using non-local variables, such as volume and
entropy, are often the most natural way to understand a system. But they are taken to be completely
explainable from a combination of local causal interactions, at least in principle.

However, there are clear cases, particularly in the realm of life and technology, where non-local
variables do seem to have direct causal effectiveness. For example, memory in the form of DNA is a
causal factor that appears to act non-locally through time, a spider's web is a non-local spatial
structure with causal effectiveness, and also the cylinder and piston of a steam engine only work
because of their highly specific spatial structure. The question then arises how non-local variables
or structures can get causal effectiveness if all foundational theories are strictly local. Locality
seems like a conserved property. In a complex system the interactions may become complex and may
strongly vary across space and time, but those interactions would still be local. Yet, in this
article I show, by construction, that non-locality with causal effectiveness can indeed arise from
local interactions. Local interactions are given in terms of local variables or in terms of
non-local variables that are completely defined by a combination of local causal interactions. Such
a defining combination does not exist if a non-local variable has causal effectiveness of its own.

Before proceeding, a disclaimer is necessary. Non-locality is also studied in the context of
quantum entanglement and Bell's theorem. But such non-locality concerns correlation rather than
causation, and the correlations are fully explained by a local theory [?]. Quantum non-locality is
not the topic of this article.

The construction explained below is simplified as much as possible. It should be seen as a mere
proof of concept, a stylized version of more elaborate actual systems. The construction proceeds
through the following steps. It assumes a population of non-isolated systems that are perturbed by
external disturbances. The systems have a limited lifetime and are autocatalytic, that is, can
replicate. Replication rates differ between different types of systems, which means that systems
with quickly increasing rates will dominate the population. How strongly external disturbances can
perturb each system is assumed to depend on the system's structure and momentary state. The form of
this dependence that is optimal for replication is derived. This form turns out to depend in a
simple way on the replication rate itself. Systems will therefore maximize their abundance in the
population if they use an approximation of this rate for modulating their variability. Whereas the
real replication rate is a non-local variable without direct causal effectiveness within a system,
the approximated replication rate has causal effectiveness through local interactions within that
system. In effect, the coupling of these rates provides a non-local variable with causal
effectiveness. The next section derives these results in detail.


II. THEORY 

We assume non-isolated systems with a dynamical structure $s$. The systems are capable of
self-replication. Systems have a small probability per unit of time to change structure as
$s \to s'$, with $s'$ a small random variation on $s$. The structural space through which $s$ can
move is undefined. Systems have a typical lifetime $\tau$ and a time-varying growth rate $k_s(t)$,
with their number $n_s(t)$ given by

$$ \frac{dn_s}{dt} = k_s(t)\, n_s(t), \tag{1} $$

with $n_s \ge 0$; when $n_s = 0$, systems of type $s$ have become extinct. Equation (1) produces
exponential growth when $k_s(t) > 0$, exponential decline when $k_s(t) < 0$, and stable numbers when
$k_s(t) = 0$. The growth rate is assumed to depend on the distance between two real-valued scalars,
$E(t)$ and $x_s(t)$. Here $E(t)$ is an environmental variable (written as $E_t$ below), and $x_s(t)$
a state variable of the system. Then

$$ k_s(x_s, t) = k_s(x_s - E_t), \tag{2} $$

with $k_s$ maximal at $x_s = E_t$ and monotonically decreasing to $-1/\tau$ for large
$|x_s - E_t|$. The latter corresponds to exponential decline when there is no replication. The
growth rate thus depends on how well the system state matches the environment. Unlimited growth is
prevented by letting $k_s$ decrease uniformly for all systems such that the total number of systems
$N(t) = \sum_s n_s$ is constrained to a given constant $N_0$. $N_0$ can be thought to depend on a
limited availability of raw materials, free energy, and space. Then $N(t) = N_0$ yields

$$ \frac{dN(t)}{dt} = \sum_s \frac{dn_s}{dt} = \sum_s k_s(t)\, n_s(t) = 0. \tag{3} $$

Because $n_s(t) > 0$ for all systems that have not become extinct, the rightmost equality implies
that $k_s(t)$ must vary around zero, on average. Variations in $E_t$ and the introduction of new
variants $s$ will occasionally drive $k_s$ downwards. Systems that can recover quickly from such
decreases by having a large $dk_s/dt$ will then gradually replace systems with smaller $dk_s/dt$.
Systems can therefore maximize the likelihood that their type $s$ persists by maximizing
$dk_s(t)/dt$ rather than $k_s(t)$ itself. This maximization must be constrained by the condition
that systems $s$ do not become extinct. Below we will derive conditions for such a constrained
maximization.

The environmental variable $E_t$ is assumed to vary unpredictably with power distributed across many
time scales, both smaller and larger than $\tau$ [?]. It can be thought to arise from a random
walk-like process, but band-limited and with a non-uniform, typically power-law spectral density
(like coloured noise [5]; $E_t$ is not assumed to be zero-mean, but its time derivative is). The
process generating $E_t$ is taken to be independent of the other random processes, in particular the
process generating new systems $s$ including their $\sigma_s$ (see below) and the Wiener process
$W_t$ (see below). Independence is interpreted here as the assumption that the processes are in no
way causally related.

The state variable $x_s$ of a system $s$ is assumed to evolve according to a random walk with
state- and time-dependent drift and diffusion

$$ dx_s(t) = \mu_s(x_s, t)\, dt + \sigma_s(x_s, t)\, dW_t, \tag{4} $$

with a deterministic part in the form of a drift $\mu_s$, and a stochastic part in the form of a
Wiener process, with $dW_t$ a zero-mean Gaussian white noise. The noise is multiplicative through
$\sigma_s$. Both $\mu_s$ and $\sigma_s$ are produced within system $s$. They are structural
properties of the system that can change along with the system's structure, with small random
variations. Structural changes are assumed to be independent of the noise $dW_t$. Both are taken to
arise from disturbances of the system. Such disturbances may come directly from thermal and quantum
noise, and indirectly from long-range electromagnetic and gravitational fluctuations.

In order to simplify the notation, the subscript $s$ is not written below. Equation (4) is an Ito
process [?] that becomes another Ito process when transformed through a function of $x$ and $t$
(Ito's lemma). For the growth rate $k(x,t)$ this produces

$$ dk = \frac{\partial k}{\partial t}\, dt + \mu \frac{\partial k}{\partial x}\, dt
      + \frac{1}{2}\sigma^2 \frac{\partial^2 k}{\partial x^2}\, dt
      + \sigma \frac{\partial k}{\partial x}\, dW_t. \tag{5} $$

Using eq. (2) and rearranging terms then gives

$$ dk = \mu \frac{\partial k}{\partial x}\, dt
      + \frac{1}{2}\sigma^2 \frac{\partial^2 k}{\partial x^2}\, dt
      + \sigma \frac{\partial k}{\partial x}\, dW_t
      - \frac{\partial k}{\partial x} \frac{dE_t}{dt}\, dt. \tag{6} $$


The first two terms represent drifts, one produced by $\mu$ and the other produced by the net effect
on $k$ of noisy variations along $x$ when $k$ as a function of $x$ is curved
($\partial^2 k/\partial x^2 \neq 0$). The last two terms in eq. (6) are noisy, one produced by the
Wiener process and the other by unpredictable changes in the environment. As stated above, if a
system is to survive amongst other systems, it should maximize its expected $dk$ without becoming
extinct. Below we will simplify the analysis by taking $\mu = 0$.
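
The rearrangement that turns Ito's lemma for $k(x,t)$ into the four-term form can be spelled out
with the chain rule. The following is only a sketch, assuming that $E_t$ is differentiable in time
(which its band-limited character suggests) and using the shorthand $z = x - E_t$ for the argument
of $k$:

```latex
% Assumption: E_t is band-limited, so dE_t/dt exists.
% With k(x,t) = k(z) and z = x - E_t (eq. 2), the chain rule gives
\begin{align*}
\frac{\partial k}{\partial x} &= \frac{dk}{dz}, \qquad
\frac{\partial^2 k}{\partial x^2} = \frac{d^2 k}{dz^2}, \qquad
\frac{\partial k}{\partial t} = -\frac{dk}{dz}\,\frac{dE_t}{dt}
  = -\frac{\partial k}{\partial x}\,\frac{dE_t}{dt}.
\end{align*}
% Substituting the last relation for the explicit time derivative in Ito's lemma:
\begin{equation*}
dk = \mu\,\frac{\partial k}{\partial x}\,dt
   + \frac{1}{2}\,\sigma^2\,\frac{\partial^2 k}{\partial x^2}\,dt
   + \sigma\,\frac{\partial k}{\partial x}\,dW_t
   - \frac{\partial k}{\partial x}\,\frac{dE_t}{dt}\,dt .
\end{equation*}
```

The explicit time dependence of $k$ thus enters only through the motion of the environment, which
is why the last term carries the environmental noise.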

The two noisy terms are equally likely positive or negative, with zero mean. Thus maximizing the
expected $dk$ implies maximizing the drift term with $\sigma^2$. However, just maximizing this term
through $\sigma^2$ would also increase the noise term depending on $\sigma$. Large noisy variations
increase the probability that $dk$ becomes negative for an extended time, and thereby increase the
likelihood that the system's type will become extinct. Therefore, the variance $v_\sigma$ of this
noise term needs to be constrained. But it should not be very different from the variance of the
last term, $v_E$, which depends on $E_t$ but not on $\sigma$. Making $v_\sigma$ much smaller than
$v_E$ would increase the probability of extinction, because then $\sigma$ and thus the drift term
would be small, whereas the noise would be nearly constant (almost completely determined by $E_t$).
On the other hand, making $v_\sigma$ much larger than $v_E$ would make $E_t$ irrelevant for the
dynamics. This would conflict with the basic assumption of the construction here that variations in
$E_t$ partly drive the systems' dynamics.

The relevant time scale for comparing the drift and noise terms is the system's lifetime $\tau$.
Through eq. (2) the growth rate $k$ depends on $z = x - E_t$. The integrals below will be limited to
a range $[-Z, Z]$ of $z$ such that beyond this range the partial derivatives of $k$ are sufficiently
small to be neglected, that is, $\partial k/\partial z \approx 0$ and
$\partial^2 k/\partial z^2 \approx 0$ for $|z| > Z$. Because $E_t$ is assumed to be a random
walk-like process, it drifts along the $z$-axis. The range of $z$ it can reach is limited because
there is no replication for large $|z|$, but that range is assumed here to be much larger than
$[-Z, Z]$. We will therefore assume that the expected values of $z$ produced by $E_t$ in a time
$\tau$ are distributed uniformly, at least approximately, over the range $[-Z, Z]$.

With these simplifying assumptions, constraining the expected noise variance over the system's
lifetime $\tau$ requires

$$ I_1 = \frac{\tau}{2Z} \int_{-Z}^{Z} dz\, \sigma^2 \left( \frac{\partial k}{\partial z} \right)^2 = K, \tag{7} $$

where $\langle dW_t^2 \rangle = dt$ was used [?], and $K$ is a positive constant such that

[Eq. (8), which relates $K$ to $\sigma_E^2(\tau)$, is not recoverable from the source text.]

Here $\sigma_E^2(\tau)$ is the expected variance of $E_t$ in a time $\tau$, which depends on the
details of $E_t$. Equation (8) implements the condition discussed above that the noise arising from
$E_t$ should neither dominate nor be negligible. However, the precise value of $K$ is not important
for the argument below. We can now find the $\sigma(z)$ that maximizes the expected drift in time
$\tau$

$$ J = \frac{\tau}{2Z} \int_{-Z}^{Z} dz\, \frac{1}{2} \sigma^2 \frac{\partial^2 k}{\partial z^2} \tag{9} $$

under the constraint of eq. (7). This is an example of an isoperimetric problem that can be solved
with the method of Lagrange multipliers [?]. Writing $g(z) = \sigma^2$, $h(z) = \partial k/\partial z$,
and $h'(z) = dh/dz$, an extremum of $J$ given constraint $I_1$ implies an extremum of the functional
$F$

$$ F(g, h, h') = \frac{1}{2}\, g(z)\, h'(z) - \lambda\, g(z)\, h^2(z), \tag{10} $$

with $\lambda$ a Lagrange multiplier. Whereas we are interested in finding the function $g$ that
maximizes $F$ for a given $h$, we will first find the function $h$ that maximizes $F$ for a given
$g$. This will result in a simple, invertible relationship between $g$ and $k$, which subsequently
also solves the problem of finding $g$ given $h$. The assumption here is that all functions involved
are sufficiently smooth, in particular that $F$ varies smoothly for small variations $\delta h$ and
$\delta g$. From the Euler-Lagrange equation

$$ \frac{d}{dz} \left( \frac{\partial F}{\partial h'} \right) - \frac{\partial F}{\partial h} = 0 \tag{11} $$

we find

$$ \frac{d}{dz}\, g(z) + 4\lambda\, g(z)\, h(z) = 0. \tag{12} $$

This gives

$$ g(z) = g_0\, e^{-4\lambda k(z)}, \tag{13} $$

where $h(z) = \partial k/\partial z$ was used and $g_0$ is a constant. The parameters $g_0$ and
$\lambda$ in eq. (13) can be found numerically from eq. (7). They depend on the detailed form of
$k(z)$, which is constrained by eq. (3). If solutions exist for given parameters, there is a range
of possible values $(g_0, \lambda)$. The largest value of $\lambda$ gives the largest $J$, because
it can be shown that $J = 2\lambda K$. This follows from using eq. (12) for expressing $h$ and $h'$
in terms of $g$ and substituting in the equations for $J$ and $K$. But $\lambda$ cannot be chosen
freely, because there is a further constraint on $g = \sigma^2$. The latter is the instantaneous
variance of $x$, because eq. (4) implies $\langle dx^2 \rangle = \sigma^2 dt$. This variance is not
thermal but actively driven, somewhat analogous to that in active matter [8]. Driving the variance
consumes a proportional amount of free energy per unit of time. The system must acquire this free
energy from its environment. How much is available for varying $x$ depends on the availability of
free energy in the environment, on evolved acquisition mechanisms within the system, and on how much
free energy the system needs for other processes. We assume here that the result of these factors
varies much slower than $x$ and $E_t$, and is effectively independent of them. The rate of available
free energy is then effectively a constant that constrains $g(z)$, and thereby $\lambda$.
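
The relation $J = 2\lambda K$ can be made plausible with a short calculation. The sketch below uses
integration by parts rather than the substitution route mentioned above, and it assumes that the
drift and the noise constraint are the lifetime-averaged integrals of $\frac{1}{2} g h'$ and
$g h^2$ over $[-Z, Z]$, and that the boundary term is negligible because $h \approx 0$ at
$|z| = Z$:

```latex
% Assumptions: J = (tau/2Z) \int (1/2) g h' dz,  K = (tau/2Z) \int g h^2 dz,
% g' = -4 \lambda g h  (the Euler-Lagrange result),  h(\pm Z) \approx 0.
\begin{align*}
J &= \frac{\tau}{2Z}\int_{-Z}^{Z} \frac{1}{2}\, g\, h' \, dz
   = \frac{\tau}{4Z}\Big[\, g\, h \,\Big]_{-Z}^{Z}
     - \frac{\tau}{4Z}\int_{-Z}^{Z} g'\, h \, dz \\
  &\approx -\frac{\tau}{4Z}\int_{-Z}^{Z} \left(-4\lambda\, g\, h\right) h \, dz
   = 2\lambda \,\frac{\tau}{2Z}\int_{-Z}^{Z} g\, h^{2} \, dz
   = 2\lambda K .
\end{align*}
```

The neglected boundary term is small but not exactly zero for a finite $Z$, so numerically
evaluated values of $J$ can differ slightly from $2\lambda K$.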

Quite remarkably, eq. (13) shows that the $\sigma$ in $dx$ (eq. (4)) that maximizes $dk$ (eq. (6))
is an explicit and very simple function of $k$, with $\sigma^2 \propto 1/\exp(4\lambda k)$. Here
$\sigma^2$ only depends on $z$ through $k$ and only depends on $t$ through $z$. Thus the
instantaneous variance is inversely related to the instantaneous growth rate. Intuitively, this
result can be understood as follows. When the growth rate is larger than zero, the contribution of
system $s$ to the population is increasing, and little change in its state is needed. But when the
growth rate is smaller than zero, the numbers of system $s$ are declining. If nothing is changed,
the system may become extinct. With an increased variance, the state varies faster, which increases
the probability that a state with positive growth rate is encountered. If that happens, the variance
is decreased automatically, which results in maintained growth, at least until changes in
environment or population require further change. Another way to view this mechanism is as a
controlled diffusion process. The systems $s$ quickly diffuse away from areas of the state space
that have a low growth rate, and much slower away from areas with a high growth rate. In effect,
they accumulate in areas with high growth. The efflux from those areas is compensated by a
continuous influx of new copies of system $s$ produced by self-replication.
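
The controlled-diffusion picture can be illustrated with a small simulation. The following is a
minimal sketch, not the author's model: it assumes a fixed environment (so that $z = x$), leaves
out replication and elimination, uses reflecting boundaries at $\pm Z$ and a plain Euler-Maruyama
step, and borrows the Gaussian growth rate and the values $k_0 = 3.19$, $g_0 = 0.76$,
$\lambda = 0.87$ from the illustration given later in this section. The function names are
hypothetical.

```python
import math
import random

# Parameters taken from the Gaussian illustration in the text.
K0, TAU, Z = 3.19, 1.0, 4.0
G0, LAM = 0.76, 0.87


def growth_rate(z):
    """Growth rate k(z) = k0 * exp(-z^2 / 2) - 1 / tau."""
    return K0 * math.exp(-z * z / 2.0) - 1.0 / TAU


def sigma(z):
    """Noise amplitude with sigma^2 = g0 * exp(-4 * lambda * k(z))."""
    return math.sqrt(G0 * math.exp(-4.0 * LAM * growth_rate(z)))


def reflect(z):
    """Fold a position back into [-Z, Z] (assumed reflecting boundaries)."""
    while z > Z or z < -Z:
        z = 2.0 * Z - z if z > Z else -2.0 * Z - z
    return z


def simulate(n_particles=200, dt=0.002, t_end=4.0, seed=1):
    """Euler-Maruyama integration of dz = sigma(z) dW with zero drift."""
    rng = random.Random(seed)
    zs = [rng.uniform(-Z, Z) for _ in range(n_particles)]  # uniform start
    sqrt_dt = math.sqrt(dt)
    for _ in range(int(t_end / dt)):
        zs = [reflect(z + sigma(z) * rng.gauss(0.0, 1.0) * sqrt_dt) for z in zs]
    return zs


final = simulate()
# Fraction of systems that ended up in the inner half of the range,
# where the growth rate is highest; a uniform spread would give 0.5.
frac_inner = sum(abs(z) < 2.0 for z in final) / len(final)
print(f"fraction with |z| < 2 after diffusion: {frac_inner:.2f}")
```

Because the noise amplitude is small where the growth rate is high, systems that wander into that
region slow down and stay there, so the printed fraction should end up well above the 0.5 expected
for a uniform spread.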

Although the optimal solution is $\sigma^2 \propto 1/\exp(4\lambda k)$, it could not be literally
realized in the system. Whereas $\sigma$ is a property of the system (eq. (4)), $k$ is the growth
rate in eq. (1). The growth rate is a non-local variable that is not available to the system in a
direct way. The system has no way to measure it directly and instantly. The system can therefore at
best approximate $k$ as an internally produced estimate $\hat{k}$. The $\sigma_s$ of eq. (4) is then
a function of $\hat{k}_s$ and not of $k_s$. The estimate $\hat{k}_s$ can gradually evolve and
improve in new, random variants of system $s$, because it is advantageous for replication. Only
factors to which the system has direct access may be included in $\hat{k}$. For example, the system
may get sensors that give information on the state of $E_t$ relative to its own state. Systems that
produce a $\hat{k}$ that estimates $k$ better will have a $\sigma^2 \propto 1/\exp(4\lambda \hat{k})$
that is closer to the optimal solution. They will therefore have an expected $dk$ that is larger
than that of other systems. The population will thus gradually become dominated by systems that have
adequate $\hat{k}$.

The reason why $\hat{k}$ need not equal $k$ exactly is that variations around the optimal $k$ will
still produce a near-optimal drift $J$. This follows from the smoothness assumption of the
variational approach taken here (eq. (10) and below). A variation of $\hat{k}$ around the optimum,
$k$, produces a variation of $\sigma$ and therefore a variation $\delta g$, which subsequently
produces a small change in $F$ and therefore in $J$ as well. Thus $J$ remains close to its optimum.
The sensitivity of $\sigma$ to variations in $\hat{k}$ depends on $\lambda$. This is a further
reason to constrain $\lambda$, depending on how accurately $\hat{k}$ estimates $k$.

It should be noted that there is no circular logic in the theory developed here. The derivation
assumes that eq. (5) follows from eq. (4), and thus that $\sigma$ is not an explicit function of
$k$. This assumption seems to conflict with eq. (13), which has $\sigma$ as a literal function of
$k$. But the assumption is correct when taking $\sigma$ as a function of $\hat{k}$. Varying $k$, as
in $dk$, does not affect $\hat{k}$ instantly. Because $\hat{k}$ cannot estimate $k$ with zero lag,
$dk$ and $d\hat{k}$ are independent locally in time. Therefore, eq. (5) still follows from eq. (4).
Estimation with non-zero lag is possible, because $k$ is autocorrelated across many time scales. The
latter property follows from eq. (2) and the fact that $E_t$ is autocorrelated in that way. Also the
structural forms of $\hat{k}$ and $\sigma$ cannot change instantly, but only as a result of further
evolution of system $s$, with some lag. The actual optimization occurs gradually in real systems. It
is therefore cyclical, involving time delays as in a feedback loop, not circular. The theoretical
derivation from eq. (4) to eq. (13) just produces a time-averaged shortcut to the ideal end-point of
the actual optimization. The result should be seen as an unreachable limit. It seems circular merely
because the optimization is static in the theory, whereas it is dynamic and approximate in actual
systems.

As an illustration of the theory, we can take $k(z) = k_0 \exp(-z^2/2) - 1/\tau$, $\tau = 1$,
$Z = 4$, and $K = 1$. In accordance with eq. (2), this function assumes a maximum growth rate for
$z = x - E_t = 0$, thus when $x$ matches $E_t$. When the match is poor, for large $|z|$, there is no
replication and $n$ declines exponentially. For simplicity, we assume here that the system has
evolved a close approximation of $k$. The system thus uses $\sigma(\hat{k})$ with
$\hat{k} \approx k$. For example, $\hat{k}$ may be based on an approximation of eq. (2) with
$E_{t^-}$ rather than $E_t$, where $E_{t^-}$ is measured by the system at a time $t^-$ slightly
before $t$. The resulting distribution of $n(z)$ depends on the details of $E_t$ and could only be
obtained through numerical simulation. In order to get an idea of the order of magnitude of the
variables involved, we may assume for this example that $E_t$ is chosen such that $n(z)$ is
approximately distributed uniformly in $[-Z, Z]$. Then $\int_{-Z}^{Z} dz\, k(z) = 0$ (from eq. (3))
gives $k_0 = 3.19$. Solutions of eq. (7) then exist for $g_0$ in the range 0 to 1.43, and
$\lambda > 0.35$. With $\bar{g}$ the mean of $g(z)$ in $[-Z, Z]$, an energy constraint
$\bar{g} = 10$ gives $g_0 = 0.76$ and $\lambda = 0.87$, with $J = 1.73$, that is, a drift 1.73 times
the standard deviation of the noise, $K^{1/2}$. $J$ increases monotonically with $\bar{g}$. Systems
that are more effective in harvesting environmental energy therefore have an advantage.
Qualitatively similar results were obtained with another functional form for the growth rate,
$k(z) = k_0/(1 + z^2) - 1/\tau$.
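
The numbers in this illustration can be checked with a few lines of numerical integration. The
sketch below is one possible way to do so, not the procedure used for the article: it assumes that
the noise constraint is $(\tau/2Z)\int g\,h^2\,dz = K$ and the drift is
$J = (\tau/2Z)\int \frac{1}{2} g\,h'\,dz$ with $g = g_0 \exp(-4\lambda k)$, and it uses a
trapezoidal rule and bisection on $\lambda$ (both choices, and all names, are assumptions of this
sketch).

```python
import math

TAU, Z = 1.0, 4.0          # lifetime and half-width of the z range
K_TARGET = 1.0             # required noise variance K
G_MEAN = 10.0              # energy constraint on the mean of g(z)
N = 2000                   # number of grid intervals on [-Z, Z]
DZ = 2.0 * Z / N
ZS = [-Z + i * DZ for i in range(N + 1)]


def trapz(values):
    """Trapezoidal rule on the uniform grid ZS."""
    return DZ * (sum(values) - 0.5 * (values[0] + values[-1]))


# k0 follows from requiring that k(z) integrates to zero over [-Z, Z].
K0 = (2.0 * Z / TAU) / trapz([math.exp(-z * z / 2.0) for z in ZS])


def k(z):
    return K0 * math.exp(-z * z / 2.0) - 1.0 / TAU


def h(z):
    """First derivative dk/dz."""
    return -K0 * z * math.exp(-z * z / 2.0)


def h_prime(z):
    """Second derivative d2k/dz2."""
    return K0 * (z * z - 1.0) * math.exp(-z * z / 2.0)


def g0_for(lam):
    """g0 that gives g(z) = g0 * exp(-4 lam k) the required mean."""
    mean_exp = trapz([math.exp(-4.0 * lam * k(z)) for z in ZS]) / (2.0 * Z)
    return G_MEAN / mean_exp


def noise_variance(lam):
    """Lifetime-averaged noise variance for a given lambda."""
    g0 = g0_for(lam)
    integrand = [g0 * math.exp(-4.0 * lam * k(z)) * h(z) ** 2 for z in ZS]
    return TAU / (2.0 * Z) * trapz(integrand)


# Bisection: the noise variance decreases as lambda increases.
lo, hi = 0.0, 3.0
for _ in range(50):
    mid = 0.5 * (lo + hi)
    if noise_variance(mid) > K_TARGET:
        lo = mid
    else:
        hi = mid
lam = 0.5 * (lo + hi)
g0 = g0_for(lam)
J = TAU / (2.0 * Z) * trapz(
    [0.5 * g0 * math.exp(-4.0 * lam * k(z)) * h_prime(z) for z in ZS]
)
print(f"k0={K0:.2f}  g0={g0:.2f}  lambda={lam:.2f}  J={J:.2f}")
```

Under these assumptions the computed $k_0$, $g_0$, $\lambda$ and $J$ should land close to the values
quoted above; small deviations of $J$ from $2\lambda K$ come from the finite integration range.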

The actual $k$ and the estimated $\hat{k}$ have quite different properties with respect to locality.
The variable $k$ is a non-local variable of the non-local theory represented by eq. (1). The
variable is non-local, because it describes the overall effect of a potentially large range of local
factors, including stochastic ones. Together these factors produce the growth rate of a system, and
they are related to $k$ in an indirect way. But this is not different, in principle, from how the
integral form is related to the local form of Maxwell's equations. They are related merely through a
well-defined, possibly complex transformation. In contrast, the variable $\hat{k}$ is rather
special. Although it is directly defined by strictly local interactions within the system, it
produces, in addition, a correlation with $k$. Correlation means here that the zero-lag
cross-correlation between $\hat{k}_s(t)$ and $k_s(t)$ is positive,
$E[\hat{k}_s(t)\, k_s(t)] > 0$. This correlation is not produced by instantaneous variations of
$\hat{k}_s(t)$ and $k_s(t)$, because $d\hat{k}_s$ and $dk_s$ are independent. Rather, it is produced
by slower changes in $\hat{k}_s(t)$ in response to changes in $k_s(t)$. As stated above, these
slower changes are effective because $k_s(t)$ is autocorrelated across many time scales.
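
That a lagged estimate can correlate with the actual rate while the instantaneous increments remain
practically uncorrelated can be seen in a toy example. The sketch below is an illustration under
strong simplifying assumptions: the growth rate is replaced by a first-order autoregressive series
(a stand-in with a single correlation time, not the multi-scale process assumed in the text), and
the estimate is simply a delayed copy of it. All names and parameter values are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def simulate_rates(n_steps=20000, phi=0.99, lag=5, seed=2):
    """Return an autocorrelated series k(t) and its delayed copy k_hat(t) = k(t - lag)."""
    rng = random.Random(seed)
    series = [0.0]
    for _ in range(n_steps + lag - 1):
        series.append(phi * series[-1] + rng.gauss(0.0, 1.0))  # AR(1) step
    k_true = series[lag:]      # actual rate at time t
    k_est = series[:-lag]      # estimate: the rate as it was `lag` steps earlier
    return k_true, k_est


def increments(xs):
    """Step-to-step changes of a series."""
    return [b - a for a, b in zip(xs, xs[1:])]


k_true, k_est = simulate_rates()
level_corr = pearson(k_true, k_est)
increment_corr = pearson(increments(k_true), increments(k_est))
print(f"correlation of k and k_hat:   {level_corr:.3f}")
print(f"correlation of dk and dk_hat: {increment_corr:.3f}")
```

In this toy setting the levels are strongly correlated because the series changes slowly compared
with the lag, whereas the increments at the same instant are nearly uncorrelated.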

The correlation between $\hat{k}$ and $k$ only exists because system variants with less or no
correlation have become extinct. No transformation between $\hat{k}$ and $k$ exists. Yet $\hat{k}$
is effective in maximizing $dk$ precisely because it has been driven, through competition between
different system types, to approximate $k$. In effect, $\hat{k}$ tracks $k$. Part of the causal
effectiveness of $\hat{k}$, as promoting system survival, arises from the fact that it tracks $k$.
Therefore, the causally effective variable $\hat{k}$ has a non-local scope, through $k$.
Equivalently, the non-local variable $k$ thus obtains causal effectiveness that goes beyond that of
the local interactions that define $k$. It has obtained causal effectiveness of its own, through
$\hat{k}$. It should be noted that there is no conflict with causality here, because non-local
spatial effectiveness has to originate from previous $k$, rather than instantaneously.

III. DISCUSSION 

Correlation in nature usually arises from direct causal connections or connections with a common
cause. Noise generally decreases such correlations over time, although there are exceptions [9]. The
theory constructed in the previous section is different on both counts. First, it uses noise to
produce rather than destroy correlations. Noise is essential for producing variants with a drift
term that utilizes a correlation between $\hat{k}$ and $k$. Second, this correlation does not
originate from direct causal connections, but from random generation followed by elimination.
Systems with no or little correlation between $\hat{k}$ and $k$ become extinct, leaving the ones
that happen to have more correlation, by chance. Crucially, the system dynamics includes
multiplicative noise that is coupled to $\hat{k}$, and thereby to the non-local $k$.

The theoretical construction explained above requires a series of assumptions. Although none of
these are implausible when taken separately, it is difficult to assess how probable they are in
combination. Yet, it should be noted that the goal here was to provide a proof of concept.
Counter-intuitively, the theory shows that causal non-locality can indeed arise from local causal
interactions. It thereby shows that causal non-locality is possible.


The theory depends critically on the existence of self-replication. Self-replication is rare, but is
known to exist in chain reactions of various kinds, in crystal growth, and in autocatalytic chemical
processes. But self-replication is most commonly found in biological organisms. Indeed, the theory
explained above resembles the Darwinian process of natural selection. Yet, it should be seen as an
addition to that process. The regular Darwinian process concerns the factor $\mu(x,t)$ that was
deliberately set to zero here. That term produces a drift proportional to $\partial k/\partial x$
(eq. (6)). Maximizing this drift requires a $\mu(x,t)$ that at least has the same sign as
$\partial k/\partial x$. It would correspond then to a conventional hill climbing optimization.
Suitable forms for $\mu(x,t)$ may be found by random variations of systems $s$, as argued by Darwin.
However, $\partial k/\partial x$ plays no role in eq. (13), not even indirectly. The term $\mu$ can
therefore not produce a correlation between a non-local and local variable as the noise term can.
Nevertheless, $\mu$ can contribute to non-locality in an indirect way. When the term with $\mu$ in
eq. (6) is positive, the condition on $K$ (eq. (8)) can be relaxed, because the system is less
vulnerable to downward fluctuations of $dk$. In addition, the range over which $z$ varies becomes
smaller, because $x$ attempts to follow $E_t$. Then $\sigma^2$ can be larger, which increases the
drift term that is responsible for producing non-locality.

Biological evolution is obviously much more complex than the mechanisms presented here. In
particular, it has a clear separation of the timescales of hereditary change and behavioural change
within an organism's lifetime. More complex versions of the model of eq. (??) that take some of
these elaborations into account have been evaluated computationally [?]. Such simulations yield
results that are consistent with those derived here more rigorously for a simplified system.

Although the theory presented here is conjectural, it provides a plausible explanation of non-local
causality. The correlation between $\hat{k}$ and $k$ is then, presumably, the origin of all more
elaborate versions of non-local causality that have subsequently evolved. Examples are the temporal
non-locality of memory (genetic, neuronal, and technological), the spatial non-locality of devices
such as spider's webs and steam engines, and, probably, even the human ability to produce non-local
theories.